Audio signal generation



The invention relates to generating an output audio signal based on an input
audio signal, and in particular to an apparatus for supplying an output audio signal.

Erik Schuijers, Werner Oomen, Bert den Brinker and Jeroen Breebaart,
"Advances in Parametric Coding for High-Quality Audio", Preprint 5852, 114th AES
Convention, Amsterdam, The Netherlands, 22-25 March 2003 disclose a parametric coding
scheme using an efficient parametric representation for the stereo image. Two input signals
are merged into one mono audio signal. Perceptually relevant spatial cues are explicitly
modeled. The merged signal is encoded using a mono parametric encoder. The stereo
parameters Interchannel Intensity Difference (IID), Interchannel Time Difference (ITD)
and Interchannel Cross-Correlation (ICC) are quantized, encoded and multiplexed into a
bitstream together with the quantized and encoded mono audio signal. At the decoder side the
bitstream is de-multiplexed to an encoded mono signal and the stereo parameters. The
encoded mono audio signal is decoded in order to obtain a decoded mono audio signal m'
(see Fig. 1). From the mono time domain signal, a de-correlated signal is calculated using a
filter D 10 yielding optimum perceptual de-correlation. Both the mono time domain signal m'
and the de-correlated signal d are transformed to the frequency domain. Then the frequency
domain stereo signal is processed with the IID, ITD and ICC parameters by scaling, phase
modifications and mixing, respectively, in a parameter processing unit 11 in order to obtain
the decoded stereo pair l' and r'. The resulting frequency domain representations are
transformed back into the time domain.

In the MPEG-4 (ISO/IEC 14496-3:2002) Proposed Draft Amendment
(PDAM) 2, Section 5.4.6, such a de-correlated signal is obtained by convoluting/filtering the
mono signal with a pre-defined impulse response.

Non pre-published European patent application 02077863.5 (Attorney docket
PHNL020639) describes the use of an all-pass filter, e.g. a comb filter, comprising a
frequency dependent delay to derive such a de-correlated signal. At high frequencies, a
relatively small delay is used, resulting in a coarse frequency resolution. At low frequencies,
a large delay results in a dense spacing of the comb filter. The filtering may be combined
with a band-limiting filter, thereby applying the de-correlation to one or more frequency
bands.

An object of the invention is to advantageously generate an output audio
signal on the basis of an input audio signal. To this end, the invention provides a device, a
method and an apparatus as defined in the independent claims. Advantageous embodiments
are defined in the dependent claims.

According to a first aspect of the invention, an output audio signal is generated
based on an input audio signal, the input audio signal comprising a plurality of input subband
signals, wherein at least part of the input subband signals is delayed to obtain a plurality of
delayed subband signals, wherein at least one input subband signal is delayed more than a
further input subband signal of higher frequency, and wherein the output audio signal is
derived from a combination of the input audio signal and the plurality of delayed subband
signals. By providing such a frequency dependent delay in the subband domain, parametric
stereo can advantageously be implemented, especially in those audio decoders where the core
decoder already includes a subband filter bank. Filter banks are commonly used in the
context of audio coding, e.g. MPEG-1/2 Layer I, II and III all make use of a 32-band
critically sampled subband filter. The plurality of delayed subband signals may be used as a
subband domain equivalent of the de-correlated signal as described above. In ideal
circumstances the correlation between the plurality of delayed subband signals and the input
audio signal is zero. However, in practical embodiments, the correlation may be up to 40%
for acceptable audio quality, up to 10% for medium to high quality audio and up to 2 or 3%
for high audio quality.
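
The correlation figures above can be checked numerically. The following is a minimal sketch, not part of the described embodiments: it assumes that "correlation" means the magnitude of the normalized cross-correlation between one complex subband signal and its delayed copy, and it uses synthetic white noise as a stand-in for a subband signal. The function names are hypothetical.

```python
import random


def normalized_correlation(x, y):
    """Magnitude of the normalized cross-correlation of two equal-length
    complex sequences (0 = fully de-correlated, 1 = identical up to a gain)."""
    cross = sum(a * b.conjugate() for a, b in zip(x, y))
    energy_x = sum(abs(a) ** 2 for a in x)
    energy_y = sum(abs(b) ** 2 for b in y)
    if energy_x == 0 or energy_y == 0:
        return 0.0
    return abs(cross) / (energy_x * energy_y) ** 0.5


def delay(x, n):
    """Delay a sequence by n samples (0 <= n <= len(x)), zero-padding the start."""
    return [0j] * n + list(x[:len(x) - n])


# Assumption: complex white noise stands in for one subband signal.
rng = random.Random(0)
subband = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(4096)]
delayed = delay(subband, 1)          # delay of a single subband sample
corr = normalized_correlation(subband, delayed)
```

For noise-like content a single subband sample of delay already gives a correlation close to zero; strongly tonal content would behave differently, so the value obtained here should not be read as a general guarantee.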

In an embodiment of the invention the output audio signal includes a plurality
of output subband signals. Combining the delayed subband signals and the input subband
signals in the subband domain in order to obtain the plurality of output subband signals is then
relatively easy to implement. In practical embodiments, a time domain output audio signal is
synthesized from the plurality of output subband signals in a synthesis subband filter bank.

In order to obtain an efficient implementation, a plurality of delay units is
provided, wherein the number of delay units is smaller than the number of input subband
signals, and wherein the input subband signals are subdivided in groups over the plurality of
delay units.

Best audio quality is obtained in embodiments where the delays in the
plurality of delay units are monotonically increasing from high frequency to low frequency.

In an advantageous embodiment of the invention, a complex filter bank is
used, which is effectively oversampled by a factor of two because for every real input sample
a complex output sample is generated which consists of effectively two values: a real and an
imaginary one. This eliminates the large aliasing components from which the MPEG-1 and
MPEG-2 critically sampled filter banks suffer.

In an efficient embodiment of generating the output audio signal, a Quadrature
Mirror Filter ("QMF") bank is used. Such a filter bank is known per se from Per Ekstrand,
"Bandwidth extension of audio signals by spectral band replication", Proc. 1st IEEE Benelux
Workshop on Model based Processing and Coding of Audio (MPCA-2002), pp. 53-58,
Leuven, Belgium, November 15, 2002. Fig. 2 shows a block diagram of such a complex
QMF analysis and synthesis filter bank. The analysis bank 30 divides the signal into N
complex valued sub bands, which are down sampled internally by a factor of N. A stylized
frequency response is shown in Fig. 3. The synthesis QMF filter bank 31 takes the N
complex sub band signals as input and generates a real valued PCM output signal. According
to an insight of the inventors, when a complex QMF filter bank is used, a de-correlated signal
can be created which is perceptually very close to the 'ideal' situation. For such a complex
QMF filter bank, implementations exist which are more efficient than the convolution used in
MPEG-4 PDAM 2, Section 5.4.6; such a convolution is relatively expensive with respect to
computational load and memory usage. As an additional advantage, using a complex QMF
filter bank also allows for an efficient combination of parametric stereo and Spectral Band
Replication ("SBR"). The idea behind SBR is that the higher frequencies can be
reconstructed from the lower frequencies using only very little helper information. In
practice, this reconstruction is done by means of a complex Quadrature Mirror Filter (QMF)
bank. In order to efficiently arrive at a de-correlated signal in the subband domain,
embodiments of the invention use a frequency (or subband index) dependent delay in the
subband domain. Because the complex QMF filter bank is not critically sampled, no extra
provisions need to be taken in order to account for aliasing. Furthermore, as the delay is
small, the overall RAM usage of this embodiment is low. Note that in the SBR decoder as
disclosed by Ekstrand, the analysis QMF bank consists of only 32 bands, while the synthesis
QMF bank consists of 64 bands, as the core decoder runs at half the sampling frequency
compared to the entire audio decoder. In the corresponding encoder, however, a 64 bands
analysis QMF bank is used to cover the whole frequency range.

These and other aspects of the invention are apparent from and will be
elucidated with reference to the embodiments described hereinafter.

d = K ~f (2)
where d is the delay in samples and f the frequency in radians.

Preferably, the input subband signals are obtained in a complex QMF analysis
filter bank, which may be present in a remote encoder, but which may also be present in the
decoder. As the outputs of a complex QMF filter bank are down sampled by a factor of N, it is
not possible to exactly map a desired time domain delay to a delay within each sub band. A
perceptually good approximation can be obtained by using rounded versions of the delay
function (2) as described above. As an example, the delay within each subband for N = 64
subbands is shown in Fig. 6. For this particular implementation only 136 complex values
have to be stored in order to form the de-correlated signal. Note that for the higher
frequencies a delay of a single sub-band sample is still employed, although the delay function
above describes a value of 0 at half the sampling frequency. The delay of a single sub-band
sample ensures that the signal is maximally de-correlated.
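
The rounding step and the figure of 136 stored complex values can be illustrated with a short calculation. This is a sketch under stated assumptions, not the delay function of the application: it assumes a delay that falls linearly from 4 subband samples at the lowest band to 0 at half the sampling frequency, evaluated at the band centres, rounded to the nearest integer and floored at one subband sample. The constants `MAX_DELAY` and `MIN_DELAY` and the function `subband_delay` are hypothetical; they were chosen because they reproduce the grouping given for Fig. 5.

```python
import math

N_SUBBANDS = 64   # N in the text
MAX_DELAY = 4     # assumed delay (subband samples) at the lowest frequencies
MIN_DELAY = 1     # the text keeps a single subband sample at high frequencies


def subband_delay(k, n_subbands=N_SUBBANDS):
    """Rounded delay, in subband samples, for zero-based subband index k."""
    f = math.pi * (k + 0.5) / n_subbands        # band centre in radians
    ideal = MAX_DELAY * (1.0 - f / math.pi)     # assumed linear delay function
    return max(MIN_DELAY, math.floor(ideal + 0.5))


delays = [subband_delay(k) for k in range(N_SUBBANDS)]
memory = sum(delays)   # one stored complex value per subband per unit of delay
```

Under these assumptions subbands 0-7 get a delay of 4, subbands 8-23 a delay of 3, subbands 24-39 a delay of 2 and subbands 40-63 a delay of 1, which adds up to 8·4 + 16·3 + 16·2 + 24·1 = 136 stored complex values.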

Fig. 5 shows a block diagram of a device 50 according to an embodiment of
the invention for generating the plurality of delayed subband signals. The device 50 is placed
somewhere between the QMF analysis filter bank 30 and the QMF synthesis filter bank 31
and comprises a plurality of delay units 501, 502, 503 and 504. The delay unit 501 provides a
one unit delay for all subbands. A group of higher frequency subbands, e.g. bands 40-64, is
furnished without further delay to the synthesis QMF filter bank 31. The group of relatively
low frequency subbands, e.g. bands 0-40, is further delayed in delay unit 502. Part of this
group, e.g. bands 0-24, is further delayed in delay unit 503 and delay unit 504 (the latter for
subbands 0-8 only). So effectively an exemplary number of 4 groups with different delays are
created, having delays of 1, 2, 3 or 4 unit delays respectively. The delay expressed in subband
samples as a function of subband index is shown in Fig. 6. The QMF analysis filter bank 30
is usually present in an audio encoder, although for SBR a smaller M bands analysis QMF
filter bank is also used in the decoder.
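
The cascade of delay units 501 to 504 can be sketched as follows. This is an illustrative model only: it assumes each delay unit holds exactly one complex sample per subband it serves, that the band ranges 0-8, 0-24, 0-40 and 0-64 are half-open ranges over zero-based indices, and that one call processes one complex sample per subband. The class name `DelayCascade` and the table `UNITS` are hypothetical.

```python
N_SUBBANDS = 64

# (reference sign of the delay unit, number of lowest subbands it delays)
UNITS = [(501, 64), (502, 40), (503, 24), (504, 8)]


class DelayCascade:
    """Four cascaded one-sample delay units acting on groups of subbands."""

    def __init__(self):
        # One single-sample register per served subband and per delay unit.
        self.state = [[0j] * width for _, width in UNITS]

    def process(self, frame):
        """Take one complex sample per subband, return the delayed frame."""
        out = list(frame)
        for regs in self.state:
            width = len(regs)
            previous = regs[:]       # samples stored during the previous call
            regs[:] = out[:width]    # store the current input of this unit
            out[:width] = previous   # the unit outputs the stored samples
        return out


cascade = DelayCascade()
impulse = [1 + 0j] * N_SUBBANDS
silence = [0j] * N_SUBBANDS
outputs = [cascade.process(impulse)]
outputs += [cascade.process(silence) for _ in range(5)]

# Index of the first call at which the impulse appears in each subband.
arrival = [next(t for t, frame in enumerate(outputs) if frame[k] != 0)
           for k in range(N_SUBBANDS)]
registers = sum(len(regs) for regs in cascade.state)
```

Because the units are chained, the lowest subbands pass through all four registers while the highest pass through only the first, so four delay units serve all 64 subbands with 64 + 40 + 24 + 8 = 136 registers in total.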

Fig. 7 shows an advantageous audio decoder 700 according to an embodiment
of the invention which combines a parametric stereo tool and SBR. A bit-stream demux 70
receives the encoded audio bitstream and derives the SBR parameters, the stereo parameters
and the core encoded audio signal. The core encoded audio signal is decoded using a core
decoder 71, which can e.g. be a standard MPEG-1 Layer III (mp3) or an AAC decoder.
Typically such a decoder runs at half the output sampling frequency (fs/2). The resulting core
decoded audio signal is fed to an M subbands complex QMF filter bank 72. This filter bank
72 outputs M complex samples per M real input samples and is thus effectively over-sampled
by a factor of 2, as explained before. In a High-Frequency (HF) generator 73, higher
frequency subbands N-M, which are not covered by the core decoded audio signal, are
generated by replicating (certain parts of) the M subbands. The output of the high-frequency
generator 73 is combined with the lower M subbands into N complex sub-band signals.
Subsequently an envelope adjuster 74 adjusts the replicated high frequency sub-band signals
to the desired envelope and an additional component adding unit 75 adds additional
sinusoidal and noise components as indicated by the SBR parameters. The total N subband
signals are furnished to a delay unit 76, which may be equal to the device 50 shown in Fig.
5, in order to generate the delayed subband signals. The N delayed subband signals and the N
input subband signals are processed in combining unit 77 in dependence on stereo parameters
such as the ICC parameter so as to derive N output subband signals for a first output channel
and N output subband signals for a second output channel. The N output subband signals for
the first output channel are fed through the N bands complex QMF synthesis filter 78 to form
the PCM output signals for left L. The N output subband signals for the second output
channel are fed through the N bands complex QMF synthesis filter 79 to form the PCM
output signals for right R. In practical embodiments, N=64 and M=32.
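
How a combining unit might use the ICC parameter can be sketched for a single subband. The text does not specify the mixing rule, so the following is an assumption: a rotation-style mix that yields the requested correlation when the input subband signal and its delayed version have equal power and are mutually uncorrelated. IID and ITD processing is left out, and the names `mix_gains` and `combine` are hypothetical.

```python
import math


def mix_gains(icc):
    """Gains (for input, for delayed signal) of the two output channels
    such that their normalized correlation equals icc (assumed rule)."""
    icc = max(-1.0, min(1.0, icc))
    alpha = 0.5 * math.acos(icc)
    c, s = math.cos(alpha), math.sin(alpha)
    return (c, s), (c, -s)


def combine(subband, delayed, icc):
    """Mix one input subband signal with its delayed version into two
    output subband signals (first and second output channel)."""
    (a_l, b_l), (a_r, b_r) = mix_gains(icc)
    left = [a_l * m + b_l * d for m, d in zip(subband, delayed)]
    right = [a_r * m + b_r * d for m, d in zip(subband, delayed)]
    return left, right


# Toy signals that are exactly orthogonal and of equal power.
m = [1 + 0j, 0j]
d = [0j, 1 + 0j]
left, right = combine(m, d, 0.5)
achieved = sum(l * r.conjugate() for l, r in zip(left, right)).real
```

With gains cos(a) and sin(a) the correlation between the two outputs is cos²(a) − sin²(a) = cos(2a), so choosing a = 0.5·arccos(ICC) gives the requested value while leaving the power of each channel unchanged.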

It should be noted that the above-mentioned embodiments illustrate rather than 
limit the invention, and that those skilled in the art will be able to design many alternative 
embodiments without departing from the scope of the appended claims. In the claims, any 
reference signs placed between parentheses shall not be construed as limiting the claim. The 
word 'comprising' does not exclude the presence of other elements or steps than those listed 
in a claim. The invention can be implemented by means of hardware comprising several
distinct elements, and by means of a suitably programmed computer. In a device claim
enumerating several means, several of these means can be embodied by one and the same
item of hardware. The mere fact that certain measures are recited in mutually different
dependent claims does not indicate that a combination of these measures cannot be used to
advantage.



PHNL030447EPS

17.04.2003



CLAIMS: 



1. A device for generating an output audio signal (L, R) based on an input audio
signal, the input audio signal comprising a plurality of input subband signals (N), the device
comprising:

a plurality of delay units (76, 501...504) for delaying at least part of the input
subband signals to obtain a plurality of delayed subband signals, wherein at least one input
subband signal is delayed more than a further input subband signal of higher frequency, and

a combining unit (77) for deriving the output audio signal from a combination
of the input audio signal and the plurality of delayed subband signals.

2. A device as claimed in claim 1, wherein the output audio signal includes a
plurality of output subband signals.

3. A device as claimed in claim 2, the device further comprising a subband filter
bank (78, 79) for synthesizing a time domain output audio signal (L, R) from the plurality of
output subband signals.

4. A device as claimed in claim 1, wherein the input audio signal is a mono audio 
signal and the output audio signal is a stereo audio signal. 

5. A device as claimed in claim 1, wherein the number of delay units is smaller
than the number of input subband signals, and wherein the input subband signals are
subdivided in groups over the plurality of delay units.

6. A device as claimed in claim 5, wherein the plurality of delay units comprises
a first delay unit (501) for delaying a group of relatively high frequency subbands with one
subband sample, and at least one further delay unit (502...504) for delaying a group of
relatively low frequency subbands with at least a further subband sample.

7. A device as claimed in claim 1, wherein the delay units provide delays which
are monotonically increasing from high frequency to low frequency.

8. A device as claimed in claim 1, wherein the subband filter bank is a complex
subband filter bank.



9. A device as claimed in claim 8, wherein the complex subband filter bank is a 

complex Quadrature Mirror Filter bank. 

10. A device as claimed in claim 1, the device further comprising:

an input (70) for obtaining a correlation parameter indicative of a desired
correlation between a first channel (L) and a second channel (R) of the output audio signal
(L, R), and

wherein the combining unit (77) is arranged for obtaining the first channel (L)
and the second channel (R) by combining the input audio signal and the plurality of delayed
subband signals in dependence on the correlation parameter.

11. A device as claimed in claim 10, wherein the first channel (L) and the second
channel (R) each comprise a plurality of output subband signals, and wherein the device
further comprises two synthesis subband filter banks (78, 79) coupled to an output of the
combining unit (77) for generating a first time domain channel (L) and a second time domain
channel (R) on the basis of the output subband signals respectively.

12. A device (700) as claimed in claim 1, wherein the device (700) further
comprises:

an analysis filter bank (72) of M subbands to generate M filtered subband
signals on the basis of a time domain core audio signal,

a high frequency generator (73, 74) for generating a high frequency signal
component derived from the M filtered subband signals, the high frequency signal
component having N-M subband signals, where N>M, the N-M subband signals including
subband signals with a higher frequency than any of the subbands in the M subbands, the M
filtered subbands and the N-M subbands together forming the plurality of input subband
signals (N).

13. A method of providing an output audio signal (L, R) based on an input audio
signal, the input audio signal comprising a plurality of input subband signals (N), the method
comprising:

delaying (501...504) at least part of the input subband signals to obtain a
plurality of delayed subband signals, wherein at least one input subband signal is delayed
more than a further input subband signal of higher frequency, and

deriving the output audio signal from a combination of the input audio signal
and the plurality of delayed subband signals.

14. An apparatus (700) for supplying an output audio signal, the apparatus
comprising:

an input unit (70) for obtaining an encoded audio signal,

a decoder (71) for decoding the encoded audio signal to obtain a decoded
signal including a plurality of subband signals,

a device as claimed in claim 1 for obtaining the output audio signal based on
the decoded signal, and

an output unit for supplying the output audio signal.

ABSTRACT:

An output audio signal (L, R) is generated based on an input audio signal, the
input audio signal comprising a plurality of input subband signals (N). The input subband
signals are delayed in a plurality of delay units (76) to obtain a plurality of delayed subband
signals, wherein at least one input subband signal is delayed more than a further input
subband signal of higher frequency, and wherein the output audio signal is derived (77) from
a combination of the input audio signal and the plurality of delayed subband signals.